 consumption rate


TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling

Chen, Junyi, Du, Chuheng, Liu, Renyuan, Yao, Shuochao, Yan, Dingtian, Liao, Jiang, Liu, Shengzhong, Wu, Fan, Chen, Guihai

arXiv.org Artificial Intelligence

Real-time LLM interactions demand streamed token generations, where text tokens are progressively generated and delivered to users while balancing two objectives: responsiveness (i.e., low time-to-first-token) and steady generation (i.e., the required time-between-tokens). Standard LLM serving systems suffer from the inflexibility caused by non-preemptive request scheduling and reactive memory management, leading to poor resource utilization and low request processing parallelism under request bursts. Therefore, we present TokenFlow, a novel LLM serving system with enhanced text streaming performance via preemptive request scheduling and proactive key-value (KV) cache management. TokenFlow dynamically prioritizes requests based on real-time token buffer occupancy and token consumption rate, while actively transferring KV cache between GPU and CPU memory in the background and overlapping I/O with computation to minimize request preemption overhead. Extensive experiments on Llama3-8B and Qwen2.5-32B across multiple GPUs (RTX 4090, A6000, H200) demonstrate that TokenFlow achieves up to 82.5% higher effective throughput (accounting for actual user consumption) while reducing P99 TTFT by up to 80.2%, without degrading overall token throughput.
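The idea of ranking requests by buffer occupancy and consumption rate can be illustrated with a minimal sketch. This is not TokenFlow's actual scheduling policy, which the abstract does not specify. The names `StreamState`, `seconds_until_empty`, and `pick_running_set` are hypothetical, and the ranking rule (serve first the stream whose buffer will run dry soonest) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    """Hypothetical per-request streaming state (not from the paper)."""
    request_id: str
    buffered_tokens: int     # tokens generated but not yet read by the user
    consumption_rate: float  # tokens per second the user reads


def seconds_until_empty(s: StreamState) -> float:
    """Estimated time before the user's token buffer runs dry."""
    if s.consumption_rate <= 0:
        return float("inf")  # nobody is reading, so the request can wait
    return s.buffered_tokens / s.consumption_rate


def pick_running_set(streams, max_running):
    """Assumed rule: run the requests closest to stalling; the rest are
    candidates for preemption."""
    ranked = sorted(streams, key=seconds_until_empty)
    return [s.request_id for s in ranked[:max_running]]


streams = [
    StreamState("a", buffered_tokens=40, consumption_rate=10.0),  # 4.0 s of slack
    StreamState("b", buffered_tokens=5, consumption_rate=5.0),    # 1.0 s of slack
    StreamState("c", buffered_tokens=0, consumption_rate=8.0),    # already stalled
    StreamState("d", buffered_tokens=100, consumption_rate=0.0),  # idle reader
]
print(pick_running_set(streams, max_running=2))  # -> ['c', 'b']
```

Under this assumed rule, a request with a large buffer and a slow reader ("a" or "d") is the natural candidate to preempt, since pausing it does not interrupt the stream the user sees.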


Cellular Plasticity Model for Bottom-Up Robotic Design

Smith, Trevor R., Smith, Thomas J., Szczecinski, Nicholas S., Yakovenko, Sergiy, Gu, Yu

arXiv.org Artificial Intelligence

Traditional top-down robotic design often lacks the adaptability needed to handle real-world complexities, prompting the need for more flexible approaches. Therefore, this study introduces a novel cellular plasticity model tailored for bottom-up robotic design. The proposed model utilizes an activator-inhibitor reaction, a common foundation of Turing patterns, which are fundamental in morphogenesis, the emergence of form from simple interactions. Turing patterns describe how diffusion and interactions between two chemical substances (an activator and an inhibitor) can lead to complex patterns and structures, such as the formation of limbs and feathers. Our study extends this concept by modeling cellular plasticity as an activator-inhibitor reaction augmented with environmental stimuli, encapsulating the core phenomena observed across various cell types: stem cells, neurons, and muscle cells. In addition to demonstrating self-regulation and self-containment, this approach ensures that a robot's form and function are direct emergent responses to its environment without a comprehensive environmental model. In the proposed model, a factory acts as the activator, producing a product that serves as the inhibitor, which is then influenced by environmental stimuli through consumption. These components are regulated by cellular plasticity phenomena as feedback loops. We calculate the equilibrium points of the model and the stability criterion. Simulations examine how varying parameters affect the system's transient behavior and the impact of competing functions on its functional capacity. Results show the model converges to a single stable equilibrium tuned to the environmental stimulation. Such dynamic behavior underscores the model's utility for generating predictable responses within robotics and biological systems, showcasing its potential for navigating the complexities of adaptive systems.
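The qualitative behavior described above (a factory that produces an inhibiting product, which the environment consumes, settling at an equilibrium set by the stimulus) can be reproduced with a toy two-variable system. The equations below are assumptions chosen for illustration and are not the paper's model: the factory grows at a constant rate `g` and is inhibited in proportion to `h * product * factory`, while the product is made by the factory and consumed at a rate proportional to `stimulus`. All names and parameter values are hypothetical.

```python
def simulate(stimulus, g=1.0, h=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of an assumed toy activator-inhibitor pair.

    d(factory)/dt = g - h * product * factory     (product inhibits factory)
    d(product)/dt = factory - stimulus * product  (environment consumes product)
    """
    factory, product = 0.1, 0.1  # arbitrary small initial state
    for _ in range(steps):
        d_factory = g - h * product * factory
        d_product = factory - stimulus * product
        factory += dt * d_factory
        product += dt * d_product
    return factory, product


def equilibrium(stimulus, g=1.0, h=1.0):
    """Closed-form fixed point of the toy system above.

    Setting both derivatives to zero gives factory = stimulus * product and
    g = h * stimulus * product**2.
    """
    product = (g / (h * stimulus)) ** 0.5
    return stimulus * product, product


for stimulus in (1.0, 4.0):
    print(stimulus, simulate(stimulus), equilibrium(stimulus))
```

In this toy system a stronger stimulus drains the product faster, which weakens the inhibition and lets the factory settle at a larger size: the equilibrium factory level is `sqrt(g * stimulus / h)`.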


AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles

Bangroo, Ishan Shivansh

arXiv.org Artificial Intelligence

In response to the global need for sustainable energy, green technology may help fight climate change. Before green infrastructure can be easily integrated into the world's energy system, it needs upgrading. By improving energy infrastructure and decision-making, artificial intelligence (AI) may help solve this challenge. Electric and hybrid vehicles (EHVs) have grown in popularity because of concerns about global warming and the need for more ecologically friendly transportation. EHVs may work better with cutting-edge technologies like AI. Electric vehicles (EVs) reduce greenhouse gas emissions and promote sustainable mobility, and they are growing in popularity due to their benefits for climate change mitigation. Unfortunately, EV production consumes a lot of energy and materials, which may harm nature. EV production is being improved using green technologies like artificial intelligence and predictive analysis. EHVs may help meet the need for ecologically friendly transportation. However, the Battery Management System (BMS) controls EHV performance and longevity. AI may improve EHV energy efficiency, emissions reduction, and sustainability. Remote hijacking, security breaches, and unauthorized access are EHV cybersecurity vulnerabilities addressed in the article. AI research and development may help make transportation more sustainable, as may optimizing EHVs and charging infrastructure.


Online Regenerative Learning

Shen, Owen

arXiv.org Artificial Intelligence

We study a type of Online Linear Programming (OLP) problem that maximizes the objective function with stochastic inputs. The performance of various algorithms that analyze this type of OLP is well studied when the stochastic inputs follow some i.i.d. distribution. The two central questions to ask are: (i) can the algorithms achieve the same efficiency if the stochastic inputs are not i.i.d. but still stationary, and (ii) how can we modify our algorithms if we know the stochastic inputs are trendy, hence not stationary. We answer the first question by analyzing a regenerative type of input and show that the regrets of two popular algorithms are bounded by the same orders as their i.i.d. counterparts. We discuss the second question in the context of linearly growing inputs and propose a trend-adaptive algorithm. We provide numerical simulations to illustrate the performance of our algorithms under both regenerative and trendy inputs.